Overview

Dataset statistics

Number of variables: 15
Number of observations: 114984
Missing cells: 4
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 19.5 MiB
Average record size in memory: 178.0 B

Variable types

NUM: 13
CAT: 1
BOOL: 1

Warnings

product_category_name_english has a high cardinality: 71 distinct values (High cardinality)
df_index has unique values (Unique)

Reproduction

Analysis started: 2020-09-12 15:17:42.410387
Analysis finished: 2020-09-12 15:19:09.838029
Duration: 1 minute and 27.43 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 114984
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58212.20914
Minimum: 0
Maximum: 116580
Zeros: 1
Zeros (%): < 0.1%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5815.15
Q1: 29083.75
median: 58178.5
Q3: 87345.25
95-th percentile: 110678.85
Maximum: 116580
Range: 116580
Interquartile range (IQR): 58261.5

Descriptive statistics

Standard deviation: 33635.58232
Coefficient of variation (CV): 0.5778097554
Kurtosis: -1.199864531
Mean: 58212.20914
Median Absolute Deviation (MAD): 29132.5
Skewness: 0.001877420489
Sum: 6693472656
Variance: 1131352398
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
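Common values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
114205 | 1 | < 0.1%
36155 | 1 | < 0.1%
46396 | 1 | < 0.1%
48445 | 1 | < 0.1%
42302 | 1 | < 0.1%
44351 | 1 | < 0.1%
87424 | 1 | < 0.1%
89473 | 1 | < 0.1%
83330 | 1 | < 0.1%
85379 | 1 | < 0.1%
95620 | 1 | < 0.1%
97669 | 1 | < 0.1%
91526 | 1 | < 0.1%
93575 | 1 | < 0.1%
71048 | 1 | < 0.1%
73097 | 1 | < 0.1%
66954 | 1 | < 0.1%
69003 | 1 | < 0.1%
79244 | 1 | < 0.1%
81293 | 1 | < 0.1%
75150 | 1 | < 0.1%
77199 | 1 | < 0.1%
116114 | 1 | < 0.1%
103832 | 1 | < 0.1%
Other values (114959) | 114959 | > 99.9%

Smallest values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
116580 | 1 | < 0.1%
116579 | 1 | < 0.1%
116578 | 1 | < 0.1%
116577 | 1 | < 0.1%
116576 | 1 | < 0.1%
116575 | 1 | < 0.1%
116574 | 1 | < 0.1%
116573 | 1 | < 0.1%
116572 | 1 | < 0.1%
116571 | 1 | < 0.1%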

price
Real number (ℝ≥0)

Distinct: 5844
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.4722953
Minimum: 0.85
Maximum: 6735
Zeros: 0
Zeros (%): 0.0%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 0.85
5-th percentile: 17
Q1: 39.9
median: 74.9
Q3: 134
95-th percentile: 349.9
Maximum: 6735
Range: 6734.15
Interquartile range (IQR): 94.1

Descriptive statistics

Standard deviation: 183.8195953
Coefficient of variation (CV): 1.525824629
Kurtosis: 120.288931
Mean: 120.4722953
Median Absolute Deviation (MAD): 41.91
Skewness: 7.91372638
Sum: 13852386.4
Variance: 33789.6436
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
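Common values
Value | Count | Frequency (%)
59.9 | 2572 | 2.2%
69.9 | 2077 | 1.8%
49.9 | 2008 | 1.7%
89.9 | 1606 | 1.4%
99.9 | 1497 | 1.3%
29.9 | 1358 | 1.2%
39.9 | 1323 | 1.2%
19.9 | 1272 | 1.1%
79.9 | 1259 | 1.1%
29.99 | 1209 | 1.1%
49 | 1191 | 1.0%
99 | 998 | 0.9%
149.9 | 893 | 0.8%
109.9 | 816 | 0.7%
119.9 | 780 | 0.7%
99.99 | 737 | 0.6%
24.9 | 704 | 0.6%
39.99 | 692 | 0.6%
35 | 689 | 0.6%
49.99 | 678 | 0.6%
34.9 | 661 | 0.6%
89.99 | 658 | 0.6%
79 | 650 | 0.6%
129.9 | 648 | 0.6%
56.99 | 640 | 0.6%
Other values (5819) | 87368 | 76.0%

Smallest values
Value | Count | Frequency (%)
0.85 | 3 | < 0.1%
1.2 | 20 | < 0.1%
2.2 | 2 | < 0.1%
2.29 | 1 | < 0.1%
2.9 | 1 | < 0.1%
2.99 | 1 | < 0.1%
3.06 | 3 | < 0.1%
3.49 | 3 | < 0.1%
3.5 | 6 | < 0.1%
3.54 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
6735 | 1 | < 0.1%
6729 | 1 | < 0.1%
6499 | 1 | < 0.1%
4799 | 1 | < 0.1%
4690 | 1 | < 0.1%
4590 | 1 | < 0.1%
4399.87 | 1 | < 0.1%
4099.99 | 1 | < 0.1%
4059 | 1 | < 0.1%
3999.9 | 1 | < 0.1%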

freight_value
Real number (ℝ≥0)

Distinct: 6928
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.01331864
Minimum: 0
Maximum: 409.68
Zeros: 386
Zeros (%): 0.3%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.78
Q1: 13.08
median: 16.31
Q3: 21.19
95-th percentile: 45.2
Maximum: 409.68
Range: 409.68
Interquartile range (IQR): 8.11

Descriptive statistics

Standard deviation: 15.75213232
Coefficient of variation (CV): 0.7870824725
Kurtosis: 58.32163924
Mean: 20.01331864
Median Absolute Deviation (MAD): 3.62
Skewness: 5.551580213
Sum: 2301211.43
Variance: 248.1296725
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
15.1 | 3742 | 3.3%
7.78 | 2282 | 2.0%
11.85 | 1942 | 1.7%
14.1 | 1911 | 1.7%
18.23 | 1590 | 1.4%
7.39 | 1545 | 1.3%
16.11 | 1186 | 1.0%
15.23 | 1038 | 0.9%
8.72 | 931 | 0.8%
16.79 | 899 | 0.8%
14.52 | 848 | 0.7%
12.79 | 814 | 0.7%
10.96 | 714 | 0.6%
9.34 | 687 | 0.6%
17.6 | 630 | 0.5%
12.69 | 618 | 0.5%
17.67 | 605 | 0.5%
15.11 | 455 | 0.4%
11.73 | 447 | 0.4%
12.48 | 438 | 0.4%
13.37 | 423 | 0.4%
17.63 | 417 | 0.4%
8.88 | 413 | 0.4%
15.79 | 409 | 0.4%
19.32 | 404 | 0.4%
Other values (6903) | 89596 | 77.9%

Smallest values
Value | Count | Frequency (%)
0 | 386 | 0.3%
0.01 | 4 | < 0.1%
0.02 | 3 | < 0.1%
0.03 | 14 | < 0.1%
0.04 | 4 | < 0.1%
0.05 | 3 | < 0.1%
0.06 | 13 | < 0.1%
0.07 | 1 | < 0.1%
0.08 | 12 | < 0.1%
0.09 | 6 | < 0.1%

Largest values
Value | Count | Frequency (%)
409.68 | 1 | < 0.1%
375.28 | 2 | < 0.1%
339.59 | 1 | < 0.1%
338.3 | 1 | < 0.1%
322.1 | 1 | < 0.1%
321.88 | 1 | < 0.1%
321.46 | 1 | < 0.1%
317.47 | 1 | < 0.1%
314.4 | 1 | < 0.1%
314.02 | 1 | < 0.1%

review_score
Real number (ℝ≥0)

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.047937104
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.374089006
Coefficient of variation (CV): 0.3394541395
Kurtosis: 0.286038152
Mean: 4.047937104
Median Absolute Deviation (MAD): 0
Skewness: -1.295852661
Sum: 465448
Variance: 1.888120598
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)
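Common values
Value | Count | Frequency (%)
5 | 65330 | 56.8%
4 | 21913 | 19.1%
1 | 14032 | 12.2%
3 | 9696 | 8.4%
2 | 4013 | 3.5%

Smallest values
Value | Count | Frequency (%)
1 | 14032 | 12.2%
2 | 4013 | 3.5%
3 | 9696 | 8.4%
4 | 21913 | 19.1%
5 | 65330 | 56.8%

Largest values
Value | Count | Frequency (%)
5 | 65330 | 56.8%
4 | 21913 | 19.1%
3 | 9696 | 8.4%
2 | 4013 | 3.5%
1 | 14032 | 12.2%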

product_photos_qty
Real number (ℝ≥0)

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.205158979
Minimum: 1
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 3
95-th percentile: 6
Maximum: 20
Range: 19
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.717360052
Coefficient of variation (CV): 0.7787919461
Kurtosis: 4.820190098
Mean: 2.205158979
Median Absolute Deviation (MAD): 0
Skewness: 1.908646709
Sum: 253558
Variance: 2.94932555
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
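Common values
Value | Count | Frequency (%)
1 | 58135 | 50.6%
2 | 22736 | 19.8%
3 | 12810 | 11.1%
4 | 8729 | 7.6%
5 | 5520 | 4.8%
6 | 3895 | 3.4%
7 | 1536 | 1.3%
8 | 765 | 0.7%
10 | 347 | 0.3%
9 | 314 | 0.3%
11 | 73 | 0.1%
12 | 59 | 0.1%
13 | 30 | < 0.1%
17 | 11 | < 0.1%
15 | 11 | < 0.1%
14 | 6 | < 0.1%
18 | 4 | < 0.1%
19 | 2 | < 0.1%
20 | 1 | < 0.1%

Smallest values
Value | Count | Frequency (%)
1 | 58135 | 50.6%
2 | 22736 | 19.8%
3 | 12810 | 11.1%
4 | 8729 | 7.6%
5 | 5520 | 4.8%
6 | 3895 | 3.4%
7 | 1536 | 1.3%
8 | 765 | 0.7%
9 | 314 | 0.3%
10 | 347 | 0.3%

Largest values
Value | Count | Frequency (%)
20 | 1 | < 0.1%
19 | 2 | < 0.1%
18 | 4 | < 0.1%
17 | 11 | < 0.1%
15 | 11 | < 0.1%
14 | 6 | < 0.1%
13 | 30 | < 0.1%
12 | 59 | 0.1%
11 | 73 | 0.1%
10 | 347 | 0.3%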

product_weight_g
Real number (ℝ≥0)

Distinct: 2188
Distinct (%): 1.9%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2109.384222
Minimum: 0
Maximum: 40425
Zeros: 8
Zeros (%): < 0.1%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 125
Q1: 300
median: 700
Q3: 1800
95-th percentile: 9800
Maximum: 40425
Range: 40425
Interquartile range (IQR): 1500

Descriptive statistics

Standard deviation: 3773.414578
Coefficient of variation (CV): 1.788870201
Kurtosis: 16.14513512
Mean: 2109.384222
Median Absolute Deviation (MAD): 500
Skewness: 3.590179266
Sum: 242543326
Variance: 14238657.57
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
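Common values
Value | Count | Frequency (%)
200 | 6803 | 5.9%
150 | 5316 | 4.6%
250 | 4647 | 4.0%
300 | 4281 | 3.7%
100 | 3532 | 3.1%
400 | 3461 | 3.0%
350 | 3201 | 2.8%
600 | 2770 | 2.4%
500 | 2758 | 2.4%
700 | 2099 | 1.8%
800 | 1858 | 1.6%
450 | 1795 | 1.6%
550 | 1684 | 1.5%
900 | 1477 | 1.3%
1000 | 1394 | 1.2%
1500 | 1368 | 1.2%
1200 | 1295 | 1.1%
850 | 1292 | 1.1%
650 | 1258 | 1.1%
1400 | 1148 | 1.0%
750 | 1109 | 1.0%
950 | 1107 | 1.0%
1100 | 1074 | 0.9%
1550 | 1054 | 0.9%
1050 | 999 | 0.9%
Other values (2163) | 56203 | 48.9%

Smallest values
Value | Count | Frequency (%)
0 | 8 | < 0.1%
2 | 5 | < 0.1%
25 | 3 | < 0.1%
50 | 954 | 0.8%
53 | 2 | < 0.1%
54 | 2 | < 0.1%
55 | 1 | < 0.1%
58 | 1 | < 0.1%
60 | 7 | < 0.1%
61 | 5 | < 0.1%

Largest values
Value | Count | Frequency (%)
40425 | 3 | < 0.1%
30000 | 296 | 0.3%
29800 | 1 | < 0.1%
29750 | 1 | < 0.1%
29700 | 3 | < 0.1%
29600 | 5 | < 0.1%
29500 | 2 | < 0.1%
29250 | 1 | < 0.1%
29150 | 1 | < 0.1%
29100 | 1 | < 0.1%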

product_length_cm
Real number (ℝ≥0)

Distinct: 99
Distinct (%): 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.29019072
Minimum: 7
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 7
5-th percentile: 16
Q1: 18
median: 25
Q3: 38
95-th percentile: 62
Maximum: 105
Range: 98
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 16.17109963
Coefficient of variation (CV): 0.5338724929
Kurtosis: 3.64605155
Mean: 30.29019072
Median Absolute Deviation (MAD): 8
Skewness: 1.737664285
Sum: 3482857
Variance: 261.5044634
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
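Common values
Value | Count | Frequency (%)
16 | 17642 | 15.3%
20 | 10544 | 9.2%
30 | 7786 | 6.8%
17 | 6092 | 5.3%
18 | 5759 | 5.0%
19 | 4816 | 4.2%
25 | 4746 | 4.1%
40 | 4220 | 3.7%
22 | 3914 | 3.4%
50 | 3067 | 2.7%
35 | 2986 | 2.6%
21 | 2436 | 2.1%
45 | 2427 | 2.1%
23 | 2336 | 2.0%
26 | 1876 | 1.6%
28 | 1774 | 1.5%
42 | 1745 | 1.5%
24 | 1710 | 1.5%
60 | 1701 | 1.5%
27 | 1499 | 1.3%
33 | 1472 | 1.3%
36 | 1454 | 1.3%
32 | 1351 | 1.2%
37 | 1333 | 1.2%
34 | 1327 | 1.2%
Other values (74) | 18970 | 16.5%

Smallest values
Value | Count | Frequency (%)
7 | 32 | < 0.1%
8 | 2 | < 0.1%
9 | 4 | < 0.1%
10 | 8 | < 0.1%
11 | 96 | 0.1%
12 | 41 | < 0.1%
13 | 58 | 0.1%
14 | 137 | 0.1%
15 | 215 | 0.2%
16 | 17642 | 15.3%

Largest values
Value | Count | Frequency (%)
105 | 322 | 0.3%
104 | 35 | < 0.1%
103 | 45 | < 0.1%
102 | 60 | 0.1%
101 | 108 | 0.1%
100 | 394 | 0.3%
99 | 36 | < 0.1%
98 | 49 | < 0.1%
97 | 11 | < 0.1%
96 | 8 | < 0.1%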

product_height_cm
Real number (ℝ≥0)

Distinct: 102
Distinct (%): 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.62439665
Minimum: 2
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 3
Q1: 8
median: 13
Q3: 20
95-th percentile: 45
Maximum: 105
Range: 103
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 13.45599126
Coefficient of variation (CV): 0.8094123079
Kurtosis: 7.295780169
Mean: 16.62439665
Median Absolute Deviation (MAD): 6
Skewness: 2.243686579
Sum: 1911523
Variance: 181.0637008
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
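Common values
Value | Count | Frequency (%)
10 | 10154 | 8.8%
20 | 6754 | 5.9%
15 | 6741 | 5.9%
11 | 6259 | 5.4%
12 | 6186 | 5.4%
2 | 5067 | 4.4%
4 | 4750 | 4.1%
8 | 4749 | 4.1%
16 | 4653 | 4.0%
5 | 4610 | 4.0%
7 | 4195 | 3.6%
13 | 4050 | 3.5%
14 | 3633 | 3.2%
30 | 3570 | 3.1%
6 | 3492 | 3.0%
9 | 3361 | 2.9%
25 | 3325 | 2.9%
22 | 3194 | 2.8%
3 | 2765 | 2.4%
18 | 2352 | 2.0%
17 | 1870 | 1.6%
35 | 1723 | 1.5%
19 | 1518 | 1.3%
21 | 1303 | 1.1%
40 | 1077 | 0.9%
Other values (77) | 13632 | 11.9%

Smallest values
Value | Count | Frequency (%)
2 | 5067 | 4.4%
3 | 2765 | 2.4%
4 | 4750 | 4.1%
5 | 4610 | 4.0%
6 | 3492 | 3.0%
7 | 4195 | 3.6%
8 | 4749 | 4.1%
9 | 3361 | 2.9%
10 | 10154 | 8.8%
11 | 6259 | 5.4%

Largest values
Value | Count | Frequency (%)
105 | 138 | 0.1%
104 | 12 | < 0.1%
103 | 49 | < 0.1%
102 | 10 | < 0.1%
100 | 41 | < 0.1%
99 | 5 | < 0.1%
98 | 3 | < 0.1%
97 | 2 | < 0.1%
96 | 7 | < 0.1%
95 | 22 | < 0.1%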

product_width_cm
Real number (ℝ≥0)

Distinct: 94
Distinct (%): 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.10772897
Minimum: 6
Maximum: 118
Zeros: 0
Zeros (%): 0.0%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 6
5-th percentile: 11
Q1: 15
median: 20
Q3: 30
95-th percentile: 45
Maximum: 118
Range: 112
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.74928356
Coefficient of variation (CV): 0.5084568707
Kurtosis: 4.571625601
Mean: 23.10772897
Median Absolute Deviation (MAD): 6
Skewness: 1.707890227
Sum: 2656996
Variance: 138.0456642
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
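Common values
Value | Count | Frequency (%)
20 | 12373 | 10.8%
11 | 10608 | 9.2%
15 | 8914 | 7.8%
16 | 8629 | 7.5%
30 | 7844 | 6.8%
12 | 5555 | 4.8%
13 | 5402 | 4.7%
14 | 4732 | 4.1%
18 | 4108 | 3.6%
40 | 4042 | 3.5%
25 | 3911 | 3.4%
17 | 3626 | 3.2%
35 | 3307 | 2.9%
22 | 2541 | 2.2%
19 | 2472 | 2.1%
21 | 1968 | 1.7%
23 | 1780 | 1.5%
28 | 1686 | 1.5%
26 | 1573 | 1.4%
29 | 1324 | 1.2%
32 | 1272 | 1.1%
27 | 1227 | 1.1%
50 | 1208 | 1.1%
36 | 1185 | 1.0%
33 | 1173 | 1.0%
Other values (69) | 12523 | 10.9%

Smallest values
Value | Count | Frequency (%)
6 | 2 | < 0.1%
7 | 5 | < 0.1%
8 | 29 | < 0.1%
9 | 50 | < 0.1%
10 | 82 | 0.1%
11 | 10608 | 9.2%
12 | 5555 | 4.8%
13 | 5402 | 4.7%
14 | 4732 | 4.1%
15 | 8914 | 7.8%

Largest values
Value | Count | Frequency (%)
118 | 8 | < 0.1%
105 | 14 | < 0.1%
104 | 1 | < 0.1%
102 | 2 | < 0.1%
101 | 2 | < 0.1%
100 | 43 | < 0.1%
98 | 1 | < 0.1%
97 | 1 | < 0.1%
95 | 2 | < 0.1%
93 | 17 | < 0.1%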

product_category_name_english
Categorical

HIGH CARDINALITY

Distinct: 71
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 898.4 KiB
bed_bath_table: 11851
health_beauty: 9892
sports_leisure: 8876
furniture_decor: 8698
computers_accessories: 8048
Other values (66): 67619
Common values
Value | Count | Frequency (%)
bed_bath_table | 11851 | 10.3%
health_beauty | 9892 | 8.6%
sports_leisure | 8876 | 7.7%
furniture_decor | 8698 | 7.6%
computers_accessories | 8048 | 7.0%
housewares | 7270 | 6.3%
watches_gifts | 6107 | 5.3%
telephony | 4647 | 4.0%
garden_tools | 4511 | 3.9%
auto | 4340 | 3.8%
toys | 4235 | 3.7%
cool_stuff | 3941 | 3.4%
perfumery | 3535 | 3.1%
baby | 3156 | 2.7%
electronics | 2824 | 2.5%
stationery | 2595 | 2.3%
fashion_bags_accessories | 2138 | 1.9%
pet_shop | 2014 | 1.8%
office_furniture | 1771 | 1.5%
consoles_games | 1179 | 1.0%
luggage_accessories | 1154 | 1.0%
construction_tools_construction | 945 | 0.8%
home_appliances | 816 | 0.7%
musical_instruments | 708 | 0.6%
small_appliances | 693 | 0.6%
Other values (46) | 9040 | 7.9%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 39
Median length: 13
Mean length: 12.99052912
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 25
Unique unicode categories: 3
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
e | 182788 | 12.2%
s | 140147 | 9.4%
t | 131583 | 8.8%
o | 110416 | 7.4%
r | 104346 | 7.0%
a | 101068 | 6.8%
_ | 100858 | 6.8%
u | 77163 | 5.2%
c | 71561 | 4.8%
i | 62492 | 4.2%
h | 58835 | 3.9%
l | 58486 | 3.9%
b | 55472 | 3.7%
n | 48364 | 3.2%
f | 37326 | 2.5%
p | 34443 | 2.3%
d | 30566 | 2.0%
y | 29729 | 2.0%
g | 20974 | 1.4%
m | 20108 | 1.3%
w | 13551 | 0.9%
k | 2174 | 0.1%
v | 689 | < 0.1%
2 | 298 | < 0.1%
x | 266 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1392547 | 93.2%
Connector Punctuation | 100858 | 6.8%
Decimal Number | 298 | < 0.1%

Most frequent Lowercase Letter characters

Value | Count | Frequency (%)
e | 182788 | 13.1%
s | 140147 | 10.1%
t | 131583 | 9.4%
o | 110416 | 7.9%
r | 104346 | 7.5%
a | 101068 | 7.3%
u | 77163 | 5.5%
c | 71561 | 5.1%
i | 62492 | 4.5%
h | 58835 | 4.2%
l | 58486 | 4.2%
b | 55472 | 4.0%
n | 48364 | 3.5%
f | 37326 | 2.7%
p | 34443 | 2.5%
d | 30566 | 2.2%
y | 29729 | 2.1%
g | 20974 | 1.5%
m | 20108 | 1.4%
w | 13551 | 1.0%
k | 2174 | 0.2%
v | 689 | < 0.1%
x | 266 | < 0.1%

Most frequent Connector Punctuation characters

Value | Count | Frequency (%)
_ | 100858 | 100.0%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
2 | 298 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1392547 | 93.2%
Common | 101156 | 6.8%

Most frequent Latin characters

Value | Count | Frequency (%)
e | 182788 | 13.1%
s | 140147 | 10.1%
t | 131583 | 9.4%
o | 110416 | 7.9%
r | 104346 | 7.5%
a | 101068 | 7.3%
u | 77163 | 5.5%
c | 71561 | 5.1%
i | 62492 | 4.5%
h | 58835 | 4.2%
l | 58486 | 4.2%
b | 55472 | 4.0%
n | 48364 | 3.5%
f | 37326 | 2.7%
p | 34443 | 2.5%
d | 30566 | 2.2%
y | 29729 | 2.1%
g | 20974 | 1.5%
m | 20108 | 1.4%
w | 13551 | 1.0%
k | 2174 | 0.2%
v | 689 | < 0.1%
x | 266 | < 0.1%

Most frequent Common characters

Value | Count | Frequency (%)
_ | 100858 | 99.7%
2 | 298 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1493703 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
e | 182788 | 12.2%
s | 140147 | 9.4%
t | 131583 | 8.8%
o | 110416 | 7.4%
r | 104346 | 7.0%
a | 101068 | 6.8%
_ | 100858 | 6.8%
u | 77163 | 5.2%
c | 71561 | 4.8%
i | 62492 | 4.2%
h | 58835 | 3.9%
l | 58486 | 3.9%
b | 55472 | 3.7%
n | 48364 | 3.2%
f | 37326 | 2.5%
p | 34443 | 2.3%
d | 30566 | 2.0%
y | 29729 | 2.0%
g | 20974 | 1.4%
m | 20108 | 1.3%
w | 13551 | 0.9%
k | 2174 | 0.1%
v | 689 | < 0.1%
2 | 298 | < 0.1%
x | 266 | < 0.1%

payment_installments
Real number (ℝ≥0)

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.946844778
Minimum: 0
Maximum: 24
Zeros: 3
Zeros (%): < 0.1%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 2
Q3: 4
95-th percentile: 10
Maximum: 24
Range: 24
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.781830288
Coefficient of variation (CV): 0.9440029919
Kurtosis: 2.521086334
Mean: 2.946844778
Median Absolute Deviation (MAD): 1
Skewness: 1.619280771
Sum: 338840
Variance: 7.738579749
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
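Common values
Value | Count | Frequency (%)
1 | 57269 | 49.8%
2 | 13332 | 11.6%
3 | 11499 | 10.0%
4 | 7816 | 6.8%
10 | 6761 | 5.9%
5 | 5916 | 5.1%
8 | 4965 | 4.3%
6 | 4518 | 3.9%
7 | 1766 | 1.5%
9 | 711 | 0.6%
12 | 164 | 0.1%
15 | 91 | 0.1%
18 | 38 | < 0.1%
24 | 34 | < 0.1%
11 | 25 | < 0.1%
20 | 20 | < 0.1%
13 | 19 | < 0.1%
14 | 15 | < 0.1%
16 | 7 | < 0.1%
17 | 7 | < 0.1%
21 | 6 | < 0.1%
0 | 3 | < 0.1%
23 | 1 | < 0.1%
22 | 1 | < 0.1%

Smallest values
Value | Count | Frequency (%)
0 | 3 | < 0.1%
1 | 57269 | 49.8%
2 | 13332 | 11.6%
3 | 11499 | 10.0%
4 | 7816 | 6.8%
5 | 5916 | 5.1%
6 | 4518 | 3.9%
7 | 1766 | 1.5%
8 | 4965 | 4.3%
9 | 711 | 0.6%

Largest values
Value | Count | Frequency (%)
24 | 34 | < 0.1%
23 | 1 | < 0.1%
22 | 1 | < 0.1%
21 | 6 | < 0.1%
20 | 20 | < 0.1%
18 | 38 | < 0.1%
17 | 7 | < 0.1%
16 | 7 | < 0.1%
15 | 91 | 0.1%
14 | 15 | < 0.1%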

payment_value
Real number (ℝ≥0)

Distinct: 28538
Distinct (%): 24.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 172.7658347
Minimum: 0
Maximum: 13664.08
Zeros: 4
Zeros (%): < 0.1%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 27.2815
Q1: 61
median: 108.19
Q3: 189.5725
95-th percentile: 515.3055
Maximum: 13664.08
Range: 13664.08
Interquartile range (IQR): 128.5725

Descriptive statistics

Standard deviation: 267.7545692
Coefficient of variation (CV): 1.549812031
Kurtosis: 516.8024147
Mean: 172.7658347
Median Absolute Deviation (MAD): 56.69
Skewness: 14.25757917
Sum: 19865306.74
Variance: 71692.50934
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
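Common values
Value | Count | Frequency (%)
50 | 338 | 0.3%
100 | 296 | 0.3%
20 | 281 | 0.2%
77.57 | 245 | 0.2%
35 | 162 | 0.1%
73.34 | 158 | 0.1%
30 | 132 | 0.1%
116.94 | 132 | 0.1%
56.78 | 119 | 0.1%
155.14 | 119 | 0.1%
107.78 | 118 | 0.1%
25 | 117 | 0.1%
65 | 113 | 0.1%
99.9 | 106 | 0.1%
86.15 | 105 | 0.1%
45 | 102 | 0.1%
87.64 | 102 | 0.1%
67.5 | 101 | 0.1%
105.28 | 98 | 0.1%
31.75 | 97 | 0.1%
64 | 96 | 0.1%
45.09 | 94 | 0.1%
37.77 | 93 | 0.1%
64.1 | 92 | 0.1%
65.71 | 90 | 0.1%
Other values (28513) | 111478 | 97.0%

Smallest values
Value | Count | Frequency (%)
0 | 4 | < 0.1%
0.01 | 6 | < 0.1%
0.03 | 2 | < 0.1%
0.05 | 2 | < 0.1%
0.08 | 2 | < 0.1%
0.09 | 1 | < 0.1%
0.1 | 3 | < 0.1%
0.11 | 2 | < 0.1%
0.13 | 1 | < 0.1%
0.14 | 5 | < 0.1%

Largest values
Value | Count | Frequency (%)
13664.08 | 8 | < 0.1%
7274.88 | 4 | < 0.1%
6929.31 | 1 | < 0.1%
6922.21 | 1 | < 0.1%
6726.66 | 1 | < 0.1%
6081.54 | 6 | < 0.1%
4950.34 | 1 | < 0.1%
4809.44 | 2 | < 0.1%
4764.34 | 1 | < 0.1%
4681.78 | 1 | < 0.1%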

encodedCategory
Real number (ℝ≥0)

Distinct: 71
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.8608763
Minimum: 0
Maximum: 70
Zeros: 247
Zeros (%): 0.2%
Memory size: 449.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 6
Q1: 15
median: 42
Q3: 60
95-th percentile: 70
Maximum: 70
Range: 70
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 22.51523877
Coefficient of variation (CV): 0.5793806244
Kurtosis: -1.357800642
Mean: 38.8608763
Median Absolute Deviation (MAD): 23
Skewness: -0.1621643987
Sum: 4468379
Variance: 506.935977
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
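Common values
Value | Count | Frequency (%)
7 | 11851 | 10.3%
43 | 9892 | 8.6%
65 | 8876 | 7.7%
39 | 8698 | 7.6%
15 | 8048 | 7.0%
49 | 7270 | 6.3%
70 | 6107 | 5.3%
68 | 4647 | 4.0%
42 | 4511 | 3.9%
5 | 4340 | 3.8%
69 | 4235 | 3.7%
20 | 3941 | 3.4%
59 | 3535 | 3.1%
6 | 3156 | 2.7%
26 | 2824 | 2.5%
66 | 2595 | 2.3%
28 | 2138 | 1.9%
60 | 2014 | 1.8%
57 | 1771 | 1.5%
16 | 1179 | 1.0%
53 | 1154 | 1.0%
17 | 945 | 0.8%
44 | 816 | 0.7%
56 | 708 | 0.6%
63 | 693 | 0.6%
Other values (46) | 9040 | 7.9%

Smallest values
Value | Count | Frequency (%)
0 | 247 | 0.2%
1 | 297 | 0.3%
2 | 208 | 0.2%
3 | 24 | < 0.1%
4 | 380 | 0.3%
5 | 4340 | 3.8%
6 | 3156 | 2.7%
7 | 11851 | 10.3%
8 | 557 | 0.5%
9 | 61 | 0.1%

Largest values
Value | Count | Frequency (%)
70 | 6107 | 5.3%
69 | 4235 | 3.7%
68 | 4647 | 4.0%
67 | 87 | 0.1%
66 | 2595 | 2.3%
65 | 8876 | 7.7%
64 | 75 | 0.1%
63 | 693 | 0.6%
62 | 199 | 0.2%
61 | 2 | < 0.1%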

TargetVar
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 898.4 KiB
1: 114431
0: 553

Common values
Value | Count | Frequency (%)
1 | 114431 | 99.5%
0 | 553 | 0.5%

Days_to_deliver
Real number (ℝ≥0)

Distinct: 93322
Distinct (%): 81.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.84195696
Minimum: 2.008009259
Maximum: 155.135463
Zeros: 0
Zeros (%): 0.0%
Memory size: 898.4 KiB

Quantile statistics

Minimum: 2.008009259
5-th percentile: 10.52446817
Q1: 18.38960648
median: 23.24926505
Q3: 28.4716985
95-th percentile: 38.62075231
Maximum: 155.135463
Range: 153.1274537
Interquartile range (IQR): 10.08209201

Descriptive statistics

Standard deviation: 8.865754856
Coefficient of variation (CV): 0.3718551656
Kurtosis: 4.949608203
Mean: 23.84195696
Median Absolute Deviation (MAD): 5.056261574
Skewness: 0.9901440265
Sum: 2741443.58
Variance: 78.60160916
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
21.14825231 | 63 | 0.1%
29.37725694 | 38 | < 0.1%
20.49641204 | 26 | < 0.1%
22.30940972 | 24 | < 0.1%
20.01428241 | 24 | < 0.1%
15.42038194 | 24 | < 0.1%
23.47988426 | 24 | < 0.1%
31.37797454 | 24 | < 0.1%
25.4999537 | 22 | < 0.1%
16.45383102 | 22 | < 0.1%
11.2365162 | 21 | < 0.1%
8.984513889 | 21 | < 0.1%
36.52185185 | 21 | < 0.1%
13.35369213 | 20 | < 0.1%
24.2334375 | 20 | < 0.1%
28.6093287 | 20 | < 0.1%
17.31607639 | 19 | < 0.1%
43.40006944 | 16 | < 0.1%
20.05296296 | 16 | < 0.1%
10.41211806 | 15 | < 0.1%
55.99005787 | 15 | < 0.1%
10.5509375 | 15 | < 0.1%
35.09387731 | 15 | < 0.1%
25.14523148 | 15 | < 0.1%
39.06083333 | 15 | < 0.1%
Other values (93297) | 114429 | 99.5%

Smallest values
Value | Count | Frequency (%)
2.008009259 | 1 | < 0.1%
2.010451389 | 1 | < 0.1%
2.024074074 | 1 | < 0.1%
2.026018519 | 1 | < 0.1%
2.028287037 | 1 | < 0.1%
2.042326389 | 1 | < 0.1%
2.042986111 | 1 | < 0.1%
2.045752315 | 1 | < 0.1%
2.047291667 | 1 | < 0.1%
2.051412037 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
155.135463 | 1 | < 0.1%
149.5922801 | 2 | < 0.1%
146.2491319 | 1 | < 0.1%
144.8952431 | 1 | < 0.1%
140.0634722 | 2 | < 0.1%
116.0978588 | 1 | < 0.1%
109.3422917 | 2 | < 0.1%
106.992338 | 1 | < 0.1%
101.0100116 | 1 | < 0.1%
99.13206019 | 1 | < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
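
The calculation described above can be sketched in plain Python. This is only an illustration: the function name `pearson_r` and the toy inputs are assumptions made for this example and are not taken from the report or from pandas-profiling. The common 1/n factors in the covariance and the standard deviations cancel, so the sketch works with raw sums.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy data, not values from the profiled dataset.
    print(pearson_r([1, 2, 3], [2, 4, 6]))
    print(pearson_r([1, 2, 3], [6, 4, 2]))
```

Shifting or positively rescaling either input leaves the result unchanged, which matches the invariance property stated above.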

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
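
A minimal sketch of that rank-based calculation follows, again in plain Python. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and giving tied values the mean of their rank positions is an assumption of this example (a common convention), not something the report states.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson_r(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical toy data: a nonlinear but strictly increasing relationship.
    print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves ρ unchanged.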

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
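
The pair-counting definition above corresponds to the variant usually called tau-a, and can be sketched as follows. The function name `kendall_tau_a` and the toy inputs are assumptions for illustration; library implementations may apply a tie correction (tau-b), so their output can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether both variables move the same way.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


if __name__ == "__main__":
    # Hypothetical toy data, not values from the profiled dataset.
    print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop is quadratic in the number of observations, which is fine for a small illustration but slow for a dataset of this report's size.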

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

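(index) | df_index | price | freight_value | review_score | product_photos_qty | product_weight_g | product_length_cm | product_height_cm | product_width_cm | product_category_name_english | payment_installments | payment_value | encodedCategory | TargetVar | Days_to_deliver
0 | 0 | 58.9 | 13.29 | 5 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 2 | 72.19 | 20 | 1 | 15.625671
1 | 1 | 55.9 | 17.96 | 5 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 1 | 73.86 | 20 | 1 | 27.505324
2 | 2 | 64.9 | 18.33 | 4 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 2 | 83.23 | 20 | 1 | 19.565359
3 | 3 | 58.9 | 16.17 | 5 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 3 | 75.07 | 20 | 1 | 23.223125
4 | 4 | 58.9 | 13.29 | 5 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 4 | 72.19 | 20 | 1 | 21.091204
5 | 5 | 55.9 | 26.93 | 5 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 1 | 82.83 | 20 | 1 | 27.366771
6 | 6 | 64.9 | 38.50 | 5 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 1 | 103.40 | 20 | 1 | 24.124491
7 | 7 | 58.9 | 18.12 | 5 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 1 | 153.75 | 20 | 1 | 31.292303
8 | 8 | 58.9 | 17.83 | 5 | 6.0 | 530.0 | 30.0 | 9.0 | 14.0 | cool_stuff | 1 | 153.75 | 20 | 1 | 31.292303
9 | 9 | 55.9 | 35.71 | 1 | 4.0 | 650.0 | 28.0 | 9.0 | 14.0 | cool_stuff | 1 | 20.00 | 20 | 1 | 30.484502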

Last rows

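(index) | df_index | price | freight_value | review_score | product_photos_qty | product_weight_g | product_length_cm | product_height_cm | product_width_cm | product_category_name_english | payment_installments | payment_value | encodedCategory | TargetVar | Days_to_deliver
114974 | 116571 | 119.90 | 27.14 | 3 | 1.0 | 2400.0 | 20.0 | 30.0 | 30.0 | bed_bath_table | 5 | 147.04 | 7 | 1 | 32.241736
114975 | 116572 | 19.00 | 15.79 | 4 | 3.0 | 150.0 | 16.0 | 9.0 | 14.0 | toys | 1 | 69.58 | 69 | 1 | 21.114560
114976 | 116573 | 19.00 | 15.79 | 4 | 3.0 | 150.0 | 16.0 | 9.0 | 14.0 | toys | 1 | 69.58 | 69 | 1 | 21.114560
114977 | 116574 | 35.99 | 16.60 | 5 | 1.0 | 1850.0 | 20.0 | 20.0 | 20.0 | food_drink | 1 | 52.59 | 37 | 1 | 27.418333
114978 | 116575 | 146.90 | 15.20 | 1 | 2.0 | 350.0 | 18.0 | 15.0 | 16.0 | home_construction | 1 | 162.10 | 48 | 1 | 20.280139
114979 | 116576 | 129.90 | 51.20 | 5 | 1.0 | 6700.0 | 35.0 | 12.0 | 22.0 | garden_tools | 1 | 181.10 | 42 | 1 | 24.163831
114980 | 116577 | 99.00 | 13.52 | 4 | 1.0 | 2300.0 | 37.0 | 30.0 | 20.0 | furniture_decor | 2 | 112.52 | 39 | 1 | 4.582650
114981 | 116578 | 736.00 | 20.91 | 5 | 3.0 | 400.0 | 19.0 | 9.0 | 15.0 | watches_gifts | 1 | 756.91 | 70 | 1 | 24.296493
114982 | 116579 | 229.90 | 44.02 | 4 | 2.0 | 2700.0 | 60.0 | 15.0 | 15.0 | sports_leisure | 7 | 273.92 | 65 | 1 | 36.310336
114983 | 116580 | 43.00 | 12.79 | 5 | 1.0 | 600.0 | 30.0 | 3.0 | 19.0 | bed_bath_table | 1 | 55.79 | 7 | 1 | 18.291458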